39 research outputs found

    Parallélisme des nids de boucles pour l’optimisation du temps d’exécution et de la taille du code

    Algorithms implemented in real-time systems increasingly include nested loops, which account for significant execution times. Several nested loop parallelism techniques have therefore been proposed to reduce these execution times. These techniques can be classified by granularity into iteration-level parallelism and instruction-level parallelism. Instruction-level techniques aim to achieve full parallelism among the instructions of a single iteration. However, loop-carried dependencies require shifting instructions across the nested loops, which increases the code size in proportion to the parallelism level. Consequently, these techniques yield implementations with non-optimal execution times and large code sizes, which are limiting factors on embedded real-time systems. This work aims to improve the parallelism strategies for nested loops. The first contribution is a novel instruction-level parallelism technique called “delayed multidimensional retiming”. It schedules nested loops with the minimal cycle period without achieving full parallelism. The second contribution applies “delayed multidimensional retiming” to the implementation of nested loops on real-time embedded systems. The aim is to meet an execution time constraint while using minimal code size. In this context, we propose a first approach that selects the minimal instruction-level parallelism that satisfies the execution time constraint. A second approach combines instruction-level and iteration-level parallelism by using “delayed multidimensional retiming” together with “loop striping”.
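The link between instruction shifting and code growth can be illustrated with a minimal sketch. It applies a plain retiming-style shift of one statement along the inner loop, not the delayed variant proposed in the thesis. The two-statement loop body, the boundary value 0 and the function names are all assumptions chosen for illustration.

```python
def b_above(B, i, j):
    """Value of B[i-1][j], with 0 assumed as the boundary above the first row."""
    return B[i - 1][j] if i > 0 else 0


def original_nest(n, m):
    """Hypothetical loop nest: the second statement waits for the first one."""
    A = [[0] * m for _ in range(n)]
    B = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            A[i][j] = b_above(B, i, j) + 1  # loop-carried: needs the previous row of B
            B[i][j] = A[i][j] * 2           # needs A from the SAME iteration
    return A, B


def retimed_nest(n, m):
    """Same computation with the A statement shifted one inner iteration ahead.

    Inside the inner loop the two statements no longer depend on each other,
    so they could run in parallel. The price is extra code: a prologue and an
    epilogue around the inner loop. Assumes m >= 1.
    """
    A = [[0] * m for _ in range(n)]
    B = [[0] * m for _ in range(n)]
    for i in range(n):
        A[i][0] = b_above(B, i, 0) + 1          # prologue: first A of the row
        for j in range(m - 1):
            a_next = b_above(B, i, j + 1) + 1   # A for column j+1
            b_cur = A[i][j] * 2                 # B for column j, A already available
            A[i][j + 1], B[i][j] = a_next, b_cur
        B[i][m - 1] = A[i][m - 1] * 2           # epilogue: last B of the row
    return A, B
```

Both functions compute the same arrays; the retimed version only trades a shorter critical path inside the loop body for the additional prologue and epilogue statements.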

    Mobile Aided System of Deep-Learning Based Cataract Grading from Fundus Images

    Cataract is an ocular disease that requires early detection to prevent progression to a higher severity level. However, a worldwide shortage of ophthalmologists and medical imaging devices hinders early cataract detection. Our main objective is to propose a high-performance cataract grading method with a low computational cost, making it suitable for mobile devices. The main contribution consists in extracting features with a transfer-learned and fine-tuned MobileNet-V2 model and deducing the cataract grade with a random forest classifier. The evaluation is conducted on a dataset of 590 fundus images, where 91.43% sensitivity, 89.58% specificity, 90.68% accuracy and 92.75% precision are achieved. In addition, the method implemented on a smartphone requires an average execution time of 1.41 seconds. Implemented as a smartphone app and paired with an optical lens for retinal capture, the method provides a mobile-aided grading system that facilitates cataract diagnosis.
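For readers unfamiliar with the four reported measures, the sketch below shows how they are conventionally derived from confusion-matrix counts. The function name and the example counts are hypothetical; they are not the confusion matrix of the study, which the abstract does not give.

```python
def grading_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures, returned as fractions in [0, 1]."""
    return {
        "sensitivity": tp / (tp + fn),               # share of diseased eyes detected
        "specificity": tn / (tn + fp),               # share of healthy eyes cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),                 # share of detections that are correct
    }


# Hypothetical counts for illustration only, NOT taken from the paper.
example = grading_metrics(tp=45, fp=5, tn=40, fn=10)
```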

    Cataract grading method based on deep convolutional neural networks and stacking ensemble learning

    Purpose: Cataract is the most common cause of severe vision impairment or blindness worldwide. It is essential to examine the retina periodically in order to prevent cataract from progressing, and thus to improve the quality of life of affected patients. Cataract grading from a fundus image is feasible with high accuracy. However, early cataract screening is delayed by a shortage of ophthalmologists and imaging devices. The challenge is to propose a computer-aided diagnosis (CAD) system that grades cataract from retinal images. Method: In this paper, an ensemble learning framework for cataract grading is put forward, in which three deep convolutional neural networks are stacked to provide higher grading performance. The main contributions of this work are as follows: (1) preprocessing and data augmentation of fundus images are performed to ensure the robustness of the cataract grading; (2) the well-known DL architectures Inception-V3, MobileNet-V2 and NasNet-Mobile are fine-tuned and trained as base classifiers; (3) a stacking method is proposed to combine the features of the base classifiers. Results: The evaluation is conducted on a dataset of 590 fundus images selected from two public databases. The suggested framework achieves 93.97% accuracy, 95.59% sensitivity, 91.67% specificity, 94.20% precision and 94.89% F-measure for cataract grading. Conclusion: The proposed framework successfully grades fundus images by cataract severity. Moreover, stacking ensemble learning achieves a performance that significantly surpasses that of each DL architecture applied separately.
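The reported F-measure can be cross-checked against the reported precision and sensitivity, assuming the usual definition of the F-measure as the harmonic mean of precision and recall (sensitivity). The abstract does not state which definition was used, so this is a consistency check under that assumption.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2 * precision * recall / (precision + recall)


reported_precision = 0.9420    # from the abstract
reported_sensitivity = 0.9559  # from the abstract (recall)

f = f_measure(reported_precision, reported_sensitivity)
print(f"{100 * f:.2f}%")  # prints 94.89%
```

The value agrees with the 94.89% F-measure given in the results.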

    Nested loop parallelism to optimize execution time and code size

    Algorithms implemented in real-time systems increasingly include nested loops, which account for significant execution times. Several nested loop parallelism techniques have therefore been proposed to reduce these execution times. These techniques can be classified by granularity into iteration-level parallelism and instruction-level parallelism. Instruction-level techniques aim to achieve full parallelism among the instructions of a single iteration. However, loop-carried dependencies require shifting instructions across the nested loops, which increases the code size in proportion to the parallelism level. Consequently, these techniques yield implementations with non-optimal execution times and large code sizes, which are limiting factors on embedded real-time systems. This work aims to improve the parallelism strategies for nested loops. The first contribution is a novel instruction-level parallelism technique called “delayed multidimensional retiming”. It schedules nested loops with the minimal cycle period without achieving full parallelism. The second contribution applies “delayed multidimensional retiming” to the implementation of nested loops on real-time embedded systems. The aim is to meet an execution time constraint while using minimal code size. In this context, we propose a first approach that selects the minimal instruction-level parallelism that satisfies the execution time constraint. A second approach combines instruction-level and iteration-level parallelism by using “delayed multidimensional retiming” together with “loop striping”.

    Computationally Efficient Blood Vessels Segmentation in Fundus Image on Shared Memory Parallel Machines

    Blood vessel segmentation in fundus images is a required step for detecting retinopathies. A high-performing segmentation method was proposed in [12]. It consists of three dependent stages: producing two binary images to extract wide vessels, computing features of the remaining pixels in the binary images to extract fine vessels, and then combining the wide and fine vessels. The segmentation takes about 3 to 12 seconds on fundus images with resolutions between 768×584 and 999×960. These resolutions are considerably lower than those produced by current retinographs, for which the execution time rises markedly. In this paper, we propose a parallelization strategy for the segmentation approach, targeting Shared Memory Parallel Machines (SMPM). First, the two binary images are produced in parallel. Next, feature processing is split according to computational complexity. In the final stage, the wide-vessel and fine-vessel images are subdivided appropriately so that they can be combined in parallel. The parallel strategy is implemented using OpenCV and assessed on the STARE public data set. Experimental analyses of execution time and efficiency are presented and discussed.
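The first and last stages of such a strategy combine task parallelism (two independent images produced concurrently) with data parallelism (an image split into row bands). The sketch below illustrates only that pattern. The range-based thresholding, the pixel-wise OR and all function names are hypothetical stand-ins, not the segmentation of [12], and the feature stage is omitted. Python threads are used for brevity; in CPython they show the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def binarize(image, lo, hi):
    """Stand-in for one binary-image stage: mark pixels with lo <= value < hi."""
    return [[1 if lo <= px < hi else 0 for px in row] for row in image]


def combine_band(band_a, band_b):
    """Stand-in for the combination stage: pixel-wise OR of two row bands."""
    return [[a | b for a, b in zip(ra, rb)] for ra, rb in zip(band_a, band_b)]


def parallel_pipeline(image, workers=2):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Task parallelism: the two binary images do not depend on each other.
        fut_a = pool.submit(binarize, image, 0, 60)
        fut_b = pool.submit(binarize, image, 200, 256)
        bin_a, bin_b = fut_a.result(), fut_b.result()

        # Data parallelism: split both images into matching row bands
        # and combine each pair of bands independently.
        step = max(1, len(image) // workers)
        bands = [(bin_a[k:k + step], bin_b[k:k + step])
                 for k in range(0, len(image), step)]
        combined = pool.map(lambda pair: combine_band(*pair), bands)
        # pool.map preserves input order, so the bands reassemble correctly.
        return [row for band in combined for row in band]


tiny_image = [[50, 150], [250, 120], [90, 210]]  # hypothetical grey levels
result = parallel_pipeline(tiny_image)
```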

    Detection of retinal abnormalities using Smartphone-captured fundus images: A survey

    Several retinal pathologies cause severe damage that may lead to vision loss. Moreover, some damage requires expensive treatment, while other damage is irreversible because no therapy exists. Early diagnosis is therefore highly recommended to control ocular diseases. However, the early stages of several ocular pathologies produce symptoms that patients cannot perceive. In addition, population ageing, which affects most industrialized countries, is an important prevalence factor for ocular diseases. Ageing also entails a lack of mobility, which limits periodic eye screening. These constraints delay ocular diagnosis, and a large number of patients with ocular pathologies is consequently recorded. Forecasts indicate that the affected population will grow in the coming years. Several devices for capturing the retina have recently been proposed. They consist of optical lenses that can be snapped onto a smartphone and provide fundus images of acceptable quality. The challenge is then to perform automatic ocular pathology detection on smartphone-captured fundus images with high detection performance while respecting the timing constraints of clinical use. This paper presents a survey of smartphone-captured fundus image quality and of the existing methods that use such images to detect retinal structures and abnormalities. For this purpose, we first summarize the works that evaluate the quality of smartphone-captured fundus images and their field of view (FOV). Then, we report on the capability to detect abnormalities and ocular pathologies from those fundus images. Finally, we propose a flowchart of the processing pipeline of detection methods for smartphone-captured fundus images and investigate the implementation environment required to detect retinal abnormalities.

    Ocular Diseases Diagnosis in Fundus Images using a Deep Learning: Approaches, tools and Performance evaluation

    Ocular pathology detection from fundus images is an important challenge in health care. Each pathology has different severity stages that may be deduced by verifying the presence of specific lesions. Each lesion is characterized by morphological features, and several lesions of different pathologies have similar features. In addition, a patient may be affected by several pathologies simultaneously. Consequently, ocular pathology detection is a multi-class classification problem that is complex to solve. Several methods for detecting ocular pathologies from fundus images have been proposed. Methods based on deep learning stand out for their higher detection performance, owing to their capability to configure the network with respect to the detection objective. This work presents a survey of ocular pathology detection methods based on deep learning. First, we study the existing methods for either lesion segmentation or pathology classification. Next, we extract the principal processing steps and analyze the proposed neural network structures. Subsequently, we identify the hardware and software environment required to employ the deep learning architectures. We then examine the experimental principles used to evaluate the methods and the databases used for the training and testing phases. The detection performance ratios and execution times are also reported and discussed.

    Execution Time and Code Size Optimization using Multidimensional Retiming and Loop Striping

    Nested loops are the most critical sections in several embedded real-time applications. To achieve higher performance, the design process employs an optimization technique to increase parallelism. However, the code size of nested loops grows considerably with the parallelism level. Under tight execution time constraints, each optimization technique therefore produces implementations with a large code size, which is a limiting factor for embedded real-time systems. In this paper, we propose a novel optimization approach that combines the delayed multidimensional retiming and loop striping techniques. It explores the solution space, composed of all parallelism cases offered by both techniques, to find the implementation that meets the execution time constraint with a smaller code size. In this context, we present the theory of combining both techniques. Then, we propose efficient algorithms that select a set of parallelism transformations based on how their execution time and code size evolve. The experimental results show that our optimization approach provides better solutions than those obtained by applying only one technique. It achieves average code size improvements of 35.21% compared to delayed multidimensional retiming and 16.38% compared to loop striping.
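The selection criterion described here, meeting the execution time constraint with the smallest code, can be sketched as a simple search over candidate implementations. This is only an exhaustive illustration of the objective, not the algorithms of the paper. The `Candidate` type, the candidate names and every number below are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    name: str        # label of the parallelism transformation (hypothetical)
    exec_time: int   # estimated execution time, in cycles
    code_size: int   # estimated code size, in instructions


def select_smallest_feasible(candidates: Sequence[Candidate],
                             deadline: int) -> Optional[Candidate]:
    """Return the candidate with minimal code size among those meeting the deadline.

    Ties on code size are broken in favour of the faster candidate.
    Returns None when no candidate satisfies the constraint.
    """
    feasible = [c for c in candidates if c.exec_time <= deadline]
    return min(feasible, key=lambda c: (c.code_size, c.exec_time), default=None)


# Hypothetical solution space, for illustration only.
space = [
    Candidate("no transformation", exec_time=1000, code_size=20),
    Candidate("retiming only", exec_time=600, code_size=48),
    Candidate("striping only", exec_time=650, code_size=40),
    Candidate("retiming + striping", exec_time=620, code_size=31),
]
best = select_smallest_feasible(space, deadline=700)
```

With these made-up figures the combined transformation is selected, since it is the smallest implementation that still meets the 700-cycle deadline.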